Chapter 13

Statistical Parsing

Given a corpus of trees, it is easy to extract a CFG and estimate its parameters. Every tree can be thought of as a CFG derivation, and we just perform relative frequency estimation (count and divide) on them. That is, let c(A \to \beta) be the number of times that the rule A \to \beta was observed, and let

    c(A) = \sum_\beta c(A \to \beta)    (13.1)

    \hat{P}(A \to \beta \mid A) = \frac{c(A \to \beta)}{c(A)}    (13.2)
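As a concrete illustration of count-and-divide, here is a minimal sketch in Python. The nested-list tree encoding and the function names are assumptions of this sketch, not anything fixed by the text:

    from collections import defaultdict

    def count_rules(trees):
        """Count rule occurrences c(A -> beta) in trees encoded as nested
        lists, e.g. ["S", ["NP", ["PRP", "I"]], ["VP", ["VBD", "slept"]]]."""
        c = defaultdict(int)
        def visit(node):
            label, children = node[0], node[1:]
            # Right-hand side: the child labels, or the word at a preterminal.
            rhs = tuple(ch if isinstance(ch, str) else ch[0] for ch in children)
            c[label, rhs] += 1
            for ch in children:
                if not isinstance(ch, str):
                    visit(ch)
        for tree in trees:
            visit(tree)
        return c

    def estimate(c):
        """Relative-frequency estimation, as in (13.1)-(13.2)."""
        totals = defaultdict(int)
        for (lhs, _), n in c.items():
            totals[lhs] += n                  # c(A), equation (13.1)
        return {rule: n / totals[rule[0]] for rule, n in c.items()}  # (13.2)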
13.1 Parser evaluation

Evaluation of parsers almost always uses labeled precision and recall, or labeled F1 score (Black et al., 1991). To define this metric, we make use of the notion of a multiset, which is a set in which items can occur more than once. If A and B are multisets, define A(x) to be the number of times that x occurs in A, and define

    |A| = \sum_x A(x)    (13.3)

    (A \cap B)(x) = \min\{A(x), B(x)\}    (13.4)

We view a tree as a multiset of brackets [X, i, j], one for each node of the tree, where X is the label of the node and w_{i+1} \cdots w_j is its span. Note that in Penn Treebank style trees, every word is an only child and its parent is a part-of-speech tag. The part-of-speech tag nodes (also called preterminal nodes) are not included in the multiset. Let t (for test) be the parser output and g (for gold) be the gold-standard tree that we are evaluating against. Then define the precision p(t, g) and recall r(t, g) to be

    p(t, g) = \frac{|t \cap g|}{|t|}    (13.5)

    r(t, g) = \frac{|t \cap g|}{|g|}    (13.6)

and the F1 score to be their harmonic mean:

    F_1(t, g) = \frac{2}{\frac{1}{p(t, g)} + \frac{1}{r(t, g)}}    (13.7)

    = \frac{2 |t \cap g|}{|t| + |g|}    (13.8)
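Equations (13.3)–(13.8) map directly onto Python's collections.Counter, whose & operator is exactly the multiset intersection of (13.4). A sketch, reusing the assumed nested-list tree encoding from above:

    from collections import Counter

    def brackets(tree, i=0):
        """Return (j, the multiset of labeled brackets [X, i, j]),
        excluding preterminal (part-of-speech tag) nodes."""
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            return i + 1, Counter()           # a preterminal spans one word
        out, j = Counter(), i
        for child in children:
            j, sub = brackets(child, j)
            out += sub
        out[label, i, j] += 1
        return j, out

    def evaluate(test, gold):
        """Labeled precision, recall, and F1 of a parse against a gold tree."""
        t, g = brackets(test)[1], brackets(gold)[1]
        match = sum((t & g).values())         # |t ∩ g|, per (13.4)
        p = match / sum(t.values())           # precision, (13.5)
        r = match / sum(g.values())           # recall, (13.6)
        f1 = 2 * match / (sum(t.values()) + sum(g.values()))  # F1, (13.8)
        return p, r, f1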
The typical setup for English parsing is to train the parser on the Penn Treebank, Wall Street Journal sections 02–21, to do development on section 00 or 22, and to test on section 23. If we train a PCFG without any modifications, we will get an F1 score of only 73%. State-of-the-art scores are above 90%.

13.2 Markovization

A PCFG captures the dependency between a parent node and all of its children. On the Penn Treebank, this leads to over 10,000 rules, each with its own probability. In practice, it turns out that this tends to be both too little and too much.

13.2.1 Vertical markovization

To see why it can be too little, suppose our treebank looked like this (Johnson, 1998; Klein and Manning, 2003):

[tree diagrams: a flat structure over words including "car", observed some number of times, and a nested structure over words including "with", "car", "dog", observed 10 times]

From this we would learn

    \hat{P}(\cdots) = 90/310    (13.9)

    \hat{P}(\cdots) = 10/310    (13.10)

and whenever the parser is asked to choose between these two trees:

(13.11) [tree diagram: a two-level analysis of a phrase over words including "with", "car", "dog"]
(13.12) [tree diagram: a three-level analysis of the same words]

it will prefer the second one, which was never observed in the training data! This can be corrected by modifying the node labels to increase their sensitivity to their vertical context, in much the same way that we can increase the context-sensitivity of an n-gram language model by increasing n. We simply annotate each node with its parent's label. For example (assuming that the parent of the upper NP is VP):

(13.13) [tree diagram: the tree with every node annotated with its parent's label, e.g., NP[mom = VP] at the top, with children annotated [mom = NP], down to the words]

Now, the parser will not be tempted to build a three-level structure (because it would require an NP[mom = NP] with an NP[mom = NP] child, which is rare). We train the PCFG on these annotated trees, and then after we parse the test data, we have to remove the annotations before evaluation. This helps the accuracy of the parser considerably (to about 77% F1).
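The transformation itself is only a few lines. A sketch of parent annotation, where the ^ separator and the dummy root parent "TOP" are my own conventions:

    def parent_annotate(tree, parent="TOP"):
        """Annotate every nonterminal with its parent's label, e.g. NP -> NP^VP."""
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            return tree                        # leave preterminals unannotated
        return [label + "^" + parent] + [parent_annotate(ch, label) for ch in children]

    def strip_annotation(label):
        """Undo the annotation on the parser's output before evaluation."""
        return label.split("^")[0]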
13.2.2 Binarization and horizontal markovization

On the other hand, our PCFG also captures too much dependency. Suppose the Treebank contains the tree fragment

(13.14) [tree diagram: an NP over "the tallest steel building in America"]

but never contains

(13.15) [tree diagram: an NP fragment with the same shape but one fewer child]

Then the parser will fail when trying to parse:

(13.16) [tree diagram: an NP over "the tallest building in America"]

The problem is that if we allow long rules, then there are many possible long rules, which our model says are all independent. But we believe that there is some relationship between them. The solution is to break long rules down into smaller rules, just as we did to reduce parsing complexity. Here, it's easier to binarize the trees instead of binarizing the grammar. For example, to binarize (13.14), we introduce new nodes, and annotate each one with the children that have been generated so far:
(13.17) [tree diagram: the binarized form of (13.14); the intermediate nodes are annotated NP[prev = DT], NP[prev = DT,JJS], NP[prev = DT,JJS,NN], and so on, over "the tallest steel building in America"]

Note that there is enough information in the annotations to reverse the binarization. So much information, in fact, that we still can't parse (13.16). We can again apply an idea from language modeling, this time in the horizontal direction: make the generation of each child depend only on the previous (n−1) children (Miller et al., 1996; Collins, 1999; Klein and Manning, 2003). For example, if n = 2:

(13.18) [tree diagram: the same binarization, but each intermediate node records only the previous child's label, e.g., NP[prev = JJS], NP[prev = NN]]

Now we can parse (13.16), and parser accuracy should be a little bit better.
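Both steps can be done in one pass. A sketch of right-branching binarization with horizontal markovization order h = n − 1, following the [prev = ...] label convention of the trees above (the nested-list encoding is still an assumption):

    def binarize(tree, h=1):
        """Binarize a tree; each intermediate node remembers only the labels
        of the previous h children."""
        label, children = tree[0], tree[1:]
        if isinstance(children[0], str):
            return tree                        # preterminal
        children = [binarize(ch, h) for ch in children]
        if len(children) <= 2:
            return [label] + children

        def chain(kids, prev):
            # prev: labels of the children generated so far, truncated to h
            node = "%s[prev=%s]" % (label, ",".join(prev))
            if len(kids) == 2:
                return [node] + kids
            return [node, kids[0], chain(kids[1:], (prev + [kids[0][0]])[-h:])]

        return [label, children[0], chain(children[1:], [children[0][0]][-h:])]

With h larger than any rule length, this reproduces the fully annotated binarization of (13.17); with h = 1, it gives the n = 2 markovization of (13.18).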
13.3 Using linguistic knowledge

Previously we saw how to increase the amount of vertical context dependency in a PCFG by changing it, effectively, from a bigram model to a trigram model, and how to decrease the amount of horizontal context dependency by changing it, effectively, from a 1-gram model to a bigram model. We can try to use linguistic knowledge to make these context dependencies more intelligent.

13.3.1 Lexicalization

In the vertical direction, a common technique is lexicalization (sometimes called head-lexicalization to distinguish it from another concept with the same name). In English parsing, PP attachment is one of the most difficult ambiguities to resolve, as illustrated by the well-known sentence:

(13.19) [tree diagram: "I saw a man with a telescope", with the PP "with a telescope" attached inside the NP]

(13.20) [tree diagram: the same sentence, with the PP attached to the VP]
Although there is a strong general preference for low attachment (13.19), the words involved may change this preference. For example, "after" would have a definite preference for attaching to the VP.

(13.21) [tree diagram: "I fed the mogwai after midnight", with the PP "after midnight" attached to the VP]

Last time, we annotated each node with the label of its parent; now, we go in the opposite direction, annotating each node with the label of one of its leaves. Which one? We choose the linguistically most important one, known as its head word, using some heuristics (e.g., the head of a VP is the verb; the head of an NP is the final noun). For example, the tree in (13.21) would become:

(13.22) [tree diagram: every node annotated with its head word: S[head = fed], NP[head = I], VP[head = fed], VBD[head = fed], NP[head = mogwai], PP[head = after], NP[head = midnight], over "I fed the mogwai after midnight"]

What did this buy us? We are going to learn a high probability for rules like

    VP[head = w] → VP[head = w] PP[head = after]    (13.23)

and a low probability for rules like

    NP[head = w] → NP[head = w] PP[head = after]    (13.24)

so that we can learn that PPs headed by "after" prefer to attach to VPs instead of NPs.
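A sketch of head annotation. Real parsers use carefully engineered head-percolation tables (e.g., Collins's); the HEAD_RULES below are toy placeholders just to make the recursion concrete:

    # Toy head rules: for each parent label, child labels to try, in order.
    HEAD_RULES = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNS"], "PP": ["IN"]}

    def head_index(label, child_labels):
        for want in HEAD_RULES.get(label, []):
            if want in child_labels:
                return child_labels.index(want)
        return len(child_labels) - 1           # fallback: the last child

    def lexicalize(tree):
        """Return (head word, tree with every node annotated with its head)."""
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            word = children[0]
            return word, ["%s[head=%s]" % (label, word), word]
        pairs = [lexicalize(ch) for ch in children]
        i = head_index(label, [ch[0] for ch in children])
        head = pairs[i][0]
        return head, ["%s[head=%s]" % (label, head)] + [t for _, t in pairs]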
If we binarize, it is convenient to binarize so that the head is generated last (lowest). Thus:

(13.25) [tree diagram: an NP over "the little house on the prairie"]

(13.26) [tree diagram: (13.25) binarized so that the head noun "house" is generated last; the intermediate nodes carry features such as [left = JJ] and [left = JJ, right = PP]]

13.3.2 Subcategorization

In the horizontal direction, a common technique is to use subcategorization. The basic idea is that some phrases (called arguments) are required and others (called adjuncts) are optional:

(13.27) Godzilla obliterated the city
(13.28) ? Godzilla obliterated

The verb "obliterated" normally takes a direct object, making the second sentence odd. On the other hand, in the sentences

(13.29) Godzilla exists
(13.30) * Godzilla exists the monster

the verb "exists" never takes a direct object. By contrast, adjuncts can occur much more freely:

(13.31) Godzilla exists today
(13.32) Godzilla obliterated the city today

This can affect parsing decisions. For example,
(13.33) I saw her duck
(13.34) I obliterated her duck

The first sentence is ambiguous for humans, because "saw" can take either an NP or an S as an argument. The second sentence is unambiguous for humans, but ambiguous for computers unless they learn that "obliterated" must take an NP argument, not an S argument.

Last time, we made the generation of a child node depend on the previous child. Now, we would like to use this same mechanism to control the number of arguments, depending on the verb. We can do this by making the generation of a child node depend on all of the previous arguments, and none of the previous adjuncts. (I've left off some annotations to save space.)

(13.35) [tree diagram: "Godzilla obliterated the city today", unannotated]

(13.36) [tree diagram: the same tree with head and argument annotations: S[head = obliterated], VP[head = obliterated], VP[head = obliterated, right = NP], NP[head = city, arg], VBD[head = obliterated]]

We marked NP[head = city] with an arg feature to indicate that it is an argument, not an adjunct. Moreover, the right feature (and the left feature, if there were one) only keeps track of previous arguments, not adjuncts.

13.4 Smoothing

With the complex nonterminals we have been creating, it may become hard to reliably estimate rule probabilities from data. The solution is to apply smoothing, as in language modeling. Witten-Bell smoothing is a fairly common choice in parsing. For example, to estimate the probability of

    VP[head = obliterated] → VP[head = obliterated, right = NP] NP[head = city, arg]

we might interpolate its relative-frequency estimate with that of

    VP[head = w] → VP[head = w, right = NP] NP[head = city, arg]

where we have replaced the word "obliterated" with a placeholder w to make the rule probability easier to estimate.

If we test our parser on unseen data, it is inevitable that it will encounter unseen words. If we don't do anything about it, the parser will simply reject any string that has an unknown word, which is obviously bad. The simplest thing to do is to simulate unknown words in the training data. That is, in the training data, replace every word that occurs only once (or at most k times) with a special symbol <unk>. Then train the PCFG as usual. Then, in the test data, replace all unknown words with <unk>. It's also fine to use multiple unknown symbols. For example, we can replace words ending in -ing with <unk-ing>. A more sophisticated approach would be to apply some of the ideas that we saw in language modeling.
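A sketch of the <unk> preprocessing described above; the threshold k and the single -ing suffix rule are illustrative choices:

    from collections import Counter

    def unkify(train_sents, test_sents, k=1):
        """Replace rare training words and unseen test words with <unk> symbols."""
        counts = Counter(w for sent in train_sents for w in sent)
        vocab = {w for w, n in counts.items() if n > k}

        def signature(w):
            return "<unk-ing>" if w.endswith("ing") else "<unk>"

        train = [[w if w in vocab else signature(w) for w in s] for s in train_sents]
        test = [[w if w in vocab else signature(w) for w in s] for s in test_sents]
        return train, test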
13.5 Beam search

The Viterbi CKY algorithm can be slow, especially if modifications to the grammar increase the nonterminal alphabet a lot. We can use beam search to speed up the search if we are willing to allow potential search errors. After completing each chart cell best[i, j], do the following:

    1: for all X ∈ N do
    2:     score[X] ← best[i, j][X] · h(X)
    3: end for
    4: choose minscore
    5: for all X ∈ N do
    6:     if score[X] < minscore then
    7:         delete best[i, j][X]
    8:         delete back[i, j][X]
    9:     end if
    10: end for

The function h(X) is called a heuristic function and is meant to estimate the relative probability of getting from the S at the root down to X. The typical thing to do is to let h(X) be the frequency of X in the training data. There are two common ways of choosing minscore (line 4):

- minscore = max_X score[X] · β, where 0 < β < 1 (typical values: 10^{-3} to 10^{-5})
- minscore is the score of the b-th best member of score (typical values of b: …)

It is also fine to set minscore to the larger of these two values.
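In Python, the pruning step might look like the following sketch, with a chart cell represented as a dict from nonterminals to Viterbi probabilities; the data structures and the default β and b are assumptions (the text's typical values of b were not preserved):

    def prune_cell(best_cell, back_cell, h, beta=1e-4, b=100):
        """Prune one chart cell best[i, j], combining both thresholds."""
        score = {X: p * h[X] for X, p in best_cell.items()}
        minscore = max(score.values()) * beta      # max_X score[X] * beta
        if len(score) > b:
            bth = sorted(score.values(), reverse=True)[b - 1]
            minscore = max(minscore, bth)          # larger of the two values
        for X in list(score):
            if score[X] < minscore:
                del best_cell[X]
                del back_cell[X]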
Question

The time complexity of CKY is normally O(n^3 |N|^3), because we have to loop over i, j, k, X, Y, and Z. If we add beam search, what will the time complexity be in terms of n and b? Assume b < |N|.
Bibliography

Black, E. et al. (1991). A procedure for quantitatively comparing the syntactic coverage of English grammars. In: Proc. DARPA Speech and Natural Language Workshop.

Collins, Michael (1999). Head-Driven Statistical Models for Natural Language Parsing. PhD thesis. University of Pennsylvania.

Johnson, Mark (1998). PCFG models of linguistic tree representations. In: Computational Linguistics 24.

Klein, Dan and Christopher D. Manning (2003). Accurate Unlexicalized Parsing. In: Proc. ACL.

Miller, Scott et al. (1996). A Fully Statistical Approach to Natural Language Interfaces. In: Proc. ACL.